
    Autonomous learning of multiple skills through intrinsic motivations: A study with computational embodied models

    The experimental works presented in this thesis have been carried out and published together with my supervisor Marco Mirolli and with Gianluca Baldassarre. In particular, chapter 3 is adapted from (Mirolli et al. 2013, Santucci et al. 2014b, 2010, 2012a); chapter 4 is adapted from (Santucci et al. 2013b,a); chapter 5 is adapted from (Santucci et al. 2014a); and chapter 6 is adapted from (Santucci et al. 2016). Moreover, (Santucci et al. 2012b) focused on the same topics as the research described in chapter 4. Since the experiments presented in that paper constitute preliminary results obtained in a simplified experimental scenario, they have not been included in this thesis; however, the insights provided by that work have informed the research presented in chapter 4.
    Developing artificial agents able to autonomously discover new goals, select them, and learn the related skills is an important challenge for robotics. This becomes even more crucial if we want robots to interact with real environments, where they have to face many unpredictable problems and where it is not clear in advance which skills will be the most suitable to solve them. The ability to learn and store multiple skills so as to use them when required is one of the main characteristics of biological agents: forming ample repertoires of actions widens an agent's possibilities to adapt to different environments and improves its chances of survival and reproduction. Moreover, humans and other mammals explore the environment and learn new skills not only on the basis of reward-related stimuli but also on the basis of novel or unexpected neutral stimuli. The mechanisms underlying this kind of learning process have been studied under the heading of “Intrinsic Motivations” (IMs), and in the last decades the concept of IMs has been used in developmental and autonomous robotics to foster an artificial curiosity that can improve the autonomy and versatility of artificial agents.
In the research presented in this thesis I focus on the development of open-ended learning robots able to autonomously discover interesting events in the environment and autonomously learn the skills necessary to reproduce those events. In particular, this research focuses on the role that IMs can play in fostering those processes and in improving the autonomy and versatility of artificial agents. Taking inspiration from recent and past research in this field, I tackle some of the interesting open challenges related to IMs and to the implementation of intrinsically motivated robots. I first focus on the neurophysiology underlying IM learning signals, and in particular on the relations between IMs and phasic dopamine (DA). With the support of a first computational model, I propose a new hypothesis that addresses the dispute over the nature and the functions of phasic DA activations: reconciling two contrasting theories in the literature and taking into account the different experimental data, I suggest that phasic DA can be considered a reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). The results obtained with my computational model support this hypothesis, showing how such a learning signal can serve two important functions: driving both the discovery and acquisition of novel actions and the maximisation of rewards. Moreover, those results provide a first example of the power of IMs to guide artificial agents in the cumulative learning of complex behaviours that would not be learnt simply by providing a direct reward for the final tasks. In a second work, I move on to investigate the issues related to the implementation of IM signals in robots.
Since the literature still lacks a specific analysis of which IM signal best drives skill acquisition, I compare different types of IMs in a robotic setup, as well as the different mechanisms used to implement them. The results provide two important contributions: 1) they show how IM signals based on the competence of the system provide better guidance for skill acquisition than signals based on the knowledge of the agent; 2) they identify a proper mechanism to generate a competence-based IM signal, showing that the stronger the link between the IM signal and the competence of the system, the better the performance. Pursuing the aim of widening the autonomy and versatility of artificial agents, in a third work I focus on improving the control architecture of the robot. I build a new 3-level architecture that allows the system to select the goals to pursue, to search for the best way to achieve them, and to acquire the related skills. I implement this architecture in a simulated iCub robot and test it in a 3D experimental scenario where the agent has to learn, on the basis of IMs, a reaching task in which it is not clear which arm of the robot is the most suitable to reach the different targets. The performance of the system is compared to that of my previous 2-level architecture, where tasks and computational resources are associated at design time. The better performance of the system endowed with the new 3-level architecture highlights the importance of developing robots with different levels of autonomy, in particular both at the high level of goal selection and at the low level of motor control. Finally, I focus on a crucial issue for autonomous robotics: the development of a system that is able not only to select its own goals, but also to discover them through interaction with the environment.
In the last work I present GRAIL, a Goal-discovering Robotic Architecture for Intrinsically-motivated Learning. Building on the insights provided by my previous research, GRAIL is a 4-level hierarchical architecture that for the first time assembles in a single system the different features necessary for the development of truly autonomous robots. GRAIL is able to autonomously 1) discover new goals, 2) create and store representations of the events associated with those goals, 3) select the goal to pursue, 4) select the computational resources to learn to achieve the desired goal, and 5) self-generate its own learning signals on the basis of the achievement of the selected goals. I implement GRAIL in a simulated iCub and test it in three different 3D experimental setups, comparing its performance to that of my previous systems, showing its capacity to generate new goals in unknown scenarios, and testing its ability to cope with stochastic environments. The experiments highlight, on the one hand, the importance of an appropriate hierarchical architecture for supporting the development of autonomous robots and, on the other hand, how IMs (together with goals) can play a crucial role in the autonomous learning of multiple skills.
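The goal-management loop that GRAIL's levels implement can be sketched abstractly. The snippet below is a minimal illustrative sketch, not GRAIL's actual equations: the event names, the interest-update rule, and all parameters are assumptions chosen only to show the shape of the loop (discover events as goals, select a goal by softmax over a self-generated interest value, and let that interest fade once the goal's skill is mastered).

```python
import math
import random

# Minimal sketch of a GRAIL-like loop (illustrative assumptions throughout:
# the event names, the interest update rule, and all parameters are toy
# stand-ins, not GRAIL's actual implementation).

random.seed(0)

class GoalManager:
    def __init__(self, temperature=0.2):
        self.goals = []        # stored representations of discovered events
        self.im_value = {}     # self-generated interest (IM value) per goal
        self.temperature = temperature

    def discover(self, event):
        """Store an unexpected event as a new goal."""
        if event not in self.goals:
            self.goals.append(event)
            self.im_value[event] = 1.0  # novel goals start maximally interesting

    def select_goal(self):
        """Softmax selection over the current IM values."""
        prefs = [math.exp(self.im_value[g] / self.temperature) for g in self.goals]
        r, acc = random.random() * sum(prefs), 0.0
        for g, p in zip(self.goals, prefs):
            acc += p
            if acc >= r:
                return g

    def update(self, goal, achieved, lr=0.3):
        """Self-generated signal: interest decays for goals the agent can
        already achieve and stays high for goals it cannot yet reach."""
        target = 0.0 if achieved else 1.0
        self.im_value[goal] += lr * (target - self.im_value[goal])

mgr = GoalManager()
for event in ("ball_moved", "light_on"):
    mgr.discover(event)              # goals found by interacting with the world

for _ in range(200):
    g = mgr.select_goal()
    achieved = (g == "ball_moved")   # toy skill: only one goal is ever achieved
    mgr.update(g, achieved)
# Interest in the mastered goal fades; the unachieved goal keeps being selected.
```

The point of the sketch is the division of labour: goal discovery, goal selection, and the learning signal are three separate mechanisms, each of which GRAIL implements at a different level of its hierarchy.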

    Cumulative learning through intrinsic reinforcements

    Building artificial agents able to autonomously learn new skills and to easily adapt to different and complex environments is an important goal for robotics and machine learning. We propose that providing reinforcement learning artificial agents with a learning signal that resembles the characteristics of the phasic activations of dopaminergic neurons would be an advancement in the development of more autonomous and versatile systems. In particular, we suggest that the particular composition of such a signal, determined by both extrinsic and intrinsic reinforcements, is suitable for improving the implementation of cumulative learning in artificial agents. To validate our hypothesis we performed experiments with a simulated robotic system that has to learn different skills to obtain extrinsic rewards. We compare different versions of the system, varying the composition of the learning signal, and show that the only system able to reach high performance in the task is the one that implements the learning signal suggested by our hypothesis.

    Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study

    An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
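As a purely illustrative sketch of the proposed composite signal (the novelty bonus, the state names, and the parameters below are assumptions, not the model used in the paper), a TD(0) learner can be driven by a reward that sums a permanent extrinsic term and a temporary intrinsic term that vanishes as the triggering event becomes expected; the TD error then plays the role of the phasic DA-like signal.

```python
# Illustrative TD(0) sketch of the hypothesis: the learning signal (the TD
# error, standing in for phasic dopamine) is driven by an extrinsic reward
# plus a temporary intrinsic bonus that decays as the event becomes expected.
# All names and parameters here are toy assumptions, not the paper's model.

ALPHA, GAMMA = 0.1, 0.9

def intrinsic_bonus(times_seen, scale=1.0):
    """Novelty-style bonus: large for surprising events, fading with exposure."""
    return scale / (1.0 + times_seen)

def td_update(V, s, s_next, extrinsic_r, seen):
    """One TD(0) step with reward = extrinsic + decaying intrinsic."""
    bonus = intrinsic_bonus(seen.get(s_next, 0))
    seen[s_next] = seen.get(s_next, 0) + 1
    r = extrinsic_r + bonus
    delta = r + GAMMA * V.get(s_next, 0.0) - V.get(s, 0.0)  # DA-like signal
    V[s] = V.get(s, 0.0) + ALPHA * delta
    return delta

V, seen = {}, {}
# First encounter of a neutral event: a large "surprise" signal.
first = td_update(V, "start", "light_on", extrinsic_r=0.0, seen=seen)
# With repetition the intrinsic component (and hence the signal) fades.
for _ in range(50):
    last = td_update(V, "start", "light_on", extrinsic_r=0.0, seen=seen)
```

With a non-zero `extrinsic_r` the same update keeps producing a persistent reward-driven signal after the novelty has worn off, which is the sense in which one mechanism can serve both action discovery and reward maximization.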

    Which is the best intrinsic motivation signal for learning multiple skills?

    Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (IMs, i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this end we implemented an artificial agent with a hierarchical architecture that allows it to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.
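The distinction between the two families of signals compared here can be sketched in a few lines. Both formulas below are simplified illustrative stand-ins, not the exact signals tested: a knowledge-based signal stays high wherever the agent's world model errs, regardless of the agent's skill, while a competence-based signal measures progress in the agent's own success rate and therefore fades once a skill is mastered.

```python
# Toy contrast between the two IM families (both formulas are simplified
# illustrative stand-ins, not the exact signals tested in the paper):
# - knowledge-based: the error of a predictor of the world, which stays high
#   wherever the world is hard to predict, regardless of the agent's skill;
# - competence-based: the improvement of the agent's own success rate, which
#   fades once a skill is mastered (or when no progress is possible).

def knowledge_based_im(predicted, observed):
    """Prediction-error signal of a world model."""
    return abs(predicted - observed)

def competence_based_im(successes, window=10):
    """Competence progress: recent success rate minus the preceding one."""
    recent = successes[-window:]
    older = successes[-2 * window:-window] or [0.0]
    return sum(recent) / len(recent) - sum(older) / len(older)

being_learned = [0] * 10 + [1] * 10   # failures turning into successes
mastered = [1] * 20                   # uniformly successful

sig_learning = competence_based_im(being_learned)  # strong signal: progress
sig_mastered = competence_based_im(mastered)       # no signal: skill acquired
```

The competence-progress form makes the link between signal and skill explicit, which is the property the results identify as decisive for learning multiple skills quickly.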

    Intrinsic motivation signals for driving the acquisition of multiple tasks: A simulated robotic study

    Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) drive humans and other biological agents to autonomously learn different skills in the absence of any biological pressure or any assigned task. In this paper we investigate which learning signal is best for driving the training of different tasks in a modular architecture controlling a simulated kinematic robotic arm that has to reach for different objects. We compare the performance of the system varying the Intrinsic Motivation signal, and we show how a Task Predictor whose learning process is strictly connected to the competence of the system in the tasks is able to generate the most suitable signal for the autonomous learning of multiple skills.

    Autonomous selection of the "what" and the "how" of learning: an intrinsically motivated system tested with a two armed robot

    In our previous research we focused on the role of intrinsically motivated learning signals in driving the selection and learning of different skills. This work makes a further step towards more autonomous and versatile robots, implementing a 3-level hierarchical architecture with the mechanisms necessary both to select goals to pursue and to search for the best way to achieve them. In particular, we focus on the important problem of providing artificial agents with a decoupled architecture that separates the selection of goals from the selection of resources. To verify our solution, we use the architecture to control the two redundant arms of a simulated iCub robotic platform tested in a reaching task within a 3D environment. We compare its performance to that of a previous model with a coupled architecture, where the different goals are associated at design time with different modules pursuing them.
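The coupled/decoupled contrast can be made concrete with a toy sketch; the target names, success probabilities, and learning rule below are illustrative assumptions, not the paper's setup. A coupled design wires each goal to a module (here, an arm) at design time, while a decoupled design learns a success estimate for every (goal, module) pair and can discover that the designer's pairing is not the best one.

```python
import random

# Toy sketch of coupled vs decoupled goal/resource selection (targets,
# success probabilities, and the learning rule are illustrative assumptions).
# A coupled design fixes the goal -> module mapping at design time; a
# decoupled one learns a success estimate per (goal, module) pair.

random.seed(1)

GOALS = ["target_left", "target_right"]
MODULES = ["left_arm", "right_arm"]

# Ground truth, unknown to the agent: one arm happens to be better for BOTH
# targets (e.g. because of arm redundancy and workspace overlap).
TRUE_SUCCESS = {
    ("target_left", "left_arm"): 0.9, ("target_left", "right_arm"): 0.2,
    ("target_right", "left_arm"): 0.8, ("target_right", "right_arm"): 0.3,
}

# Coupled architecture: mapping wired in at design time.
coupled = {"target_left": "left_arm", "target_right": "right_arm"}

# Decoupled architecture: learn a success estimate for every pair.
est = {pair: 0.5 for pair in TRUE_SUCCESS}
for _ in range(2000):
    goal = random.choice(GOALS)
    module = random.choice(MODULES)            # explore pairings
    outcome = 1.0 if random.random() < TRUE_SUCCESS[(goal, module)] else 0.0
    est[(goal, module)] += 0.05 * (outcome - est[(goal, module)])

decoupled = {g: max(MODULES, key=lambda m: est[(g, m)]) for g in GOALS}
```

Under these toy numbers the decoupled selector assigns the better arm to both targets, whereas the coupled mapping is stuck with the inferior arm for one of them; this is the kind of flexibility that separating the "what" (goals) from the "how" (resources) is meant to provide.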